Amortized Bayesian Meta-Learning with Accelerated Gradient Descent Steps
Abstract
Recent meta-learning models often learn priors from observed tasks using a network optimized via stochastic gradient descent (SGD), which usually requires many training steps to converge. In this paper, we propose an accelerated amortized Bayesian meta-learning structure with an inference network (ABML-SIN), which aims to improve the training speed and efficiency of meta-learning. Current approaches hardly converge within a few optimization steps owing to the small number of training samples. We therefore introduce a teacher–student architecture that learns the meta-latent variable θt for task t. With an amortized fast network, the meta-learner extracts task-specific latent variables within a few gradient-descent steps, which accelerates the meta-learner. To refine the latent variables generated by the transductive amortization network of the meta-learner, SIN—followed by a conventional SGD-optimized network—is introduced as the student–teacher network to update the parameters online. SIN extracts the local latent variables and accelerates the convergence of the meta-network. Our experiments on simulated data demonstrate that the proposed method generalizes and scales well to unseen samples, and produces competitive or superior uncertainty estimations on few-shot learning tasks on two widely adopted 2D datasets with fewer training epochs than state-of-the-art approaches. Furthermore, the amortized parameters act as perturbations on the network weights, enhancing the probability of escaping local optima and accelerating training efficiency. Extensive qualitative experiments show that our method performs well across different tasks in both simulated and real-world circumstances.
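To illustrate the core idea of accelerating the inner adaptation loop, the sketch below adapts a task-specific parameter with a few momentum-accelerated gradient steps instead of plain SGD. This is a minimal illustration, not the paper's actual ABML-SIN architecture; the function names, the toy quadratic task, and the hyperparameters are all assumptions for demonstration.

```python
import numpy as np

def inner_loop_adapt(theta, grad_fn, steps=5, lr=0.1, momentum=0.9):
    """Adapt a task-specific parameter with momentum-accelerated gradient steps.

    theta   : initial meta-learned parameters (np.ndarray)
    grad_fn : callable returning the task-loss gradient at theta
    """
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        velocity = momentum * velocity + g  # heavy-ball accumulation
        theta = theta - lr * velocity       # accelerated parameter update
    return theta

# Toy task: quadratic loss 0.5 * ||theta - target||^2, gradient theta - target.
target = np.array([1.0, -2.0])
adapted = inner_loop_adapt(np.zeros(2), lambda th: th - target)
```

Within five steps the momentum term compounds the gradient direction, so the adapted parameters land much closer to the task optimum than five plain SGD steps with the same learning rate would.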
Similar resources
Learning to Draw Samples with Amortized Stein Variational Gradient Descent
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction (Liu & Wang, 2016) that maximally decreases the KL divergence with the target distribution. Our method work...
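The Stein variational gradient direction mentioned above has a closed form: each particle is moved by a kernel-weighted average of the target's score plus a repulsive kernel-gradient term. Below is a minimal NumPy sketch of one SVGD update on 1-D particles; the RBF bandwidth, step size, and standard-normal target are illustrative assumptions, not details from the cited work.

```python
import numpy as np

def svgd_step(particles, grad_logp, h=1.0, eps=0.1):
    """One Stein variational gradient descent update on 1-D particles.

    particles : (n,) array of samples
    grad_logp : score function of the target, evaluated elementwise
    """
    diff = particles[:, None] - particles[None, :]   # x_j - x_i
    k = np.exp(-diff**2 / (2 * h**2))                # RBF kernel matrix
    grad_k = -diff / h**2 * k                        # d k(x_j, x_i) / d x_j
    # phi_i = (1/n) sum_j [ k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i) ]
    phi = (k.T @ grad_logp(particles) + grad_k.sum(axis=0)) / len(particles)
    return particles + eps * phi

# Target: standard normal, whose score is -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=0.5, size=50)          # particles start off-center
for _ in range(200):
    x = svgd_step(x, lambda p: -p)
```

The first term drags particles toward high-density regions of the target, while the kernel-gradient term pushes them apart so they do not collapse to the mode.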
Amortized Analysis on Asynchronous Gradient Descent
Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically, gradient descent has been a sequential and synchronous process. Distributed and asynchronous variants of gradient descent have been studied since the 1980s, and they have been experiencing a resurgence due to demand from large-scale machine learning problems running on multi-core process...
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the general family of"momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stat...
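Nesterov's acceleration evaluates the gradient at a look-ahead point before applying the momentum update, which is what yields the faster convergence rate in the convex setting. The sketch below compares plain GD with a Nesterov-style update on an ill-conditioned quadratic; the problem instance and hyperparameters are illustrative assumptions.

```python
import numpy as np

def gd(x, grad, lr, steps):
    """Plain gradient descent."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def nesterov(x, grad, lr, steps, mu=0.9):
    """Nesterov-style accelerated gradient descent."""
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x - mu * v           # probe the gradient ahead of x
        v = mu * v + lr * grad(lookahead)
        x = x - v
    return x

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x with condition number 100.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])
lr = 1.0 / 100.0                         # 1/L for the largest eigenvalue L
x_gd = gd(x0.copy(), grad, lr, steps=100)
x_agd = nesterov(x0.copy(), grad, lr, steps=100)
```

On this problem the slow coordinate contracts at roughly (1 - 1/kappa) per GD step, whereas the accelerated iterate contracts at roughly (1 - 1/sqrt(kappa)), so after 100 steps `x_agd` is orders of magnitude closer to the optimum than `x_gd`.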
Asynchronous Accelerated Stochastic Gradient Descent
Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov’s acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data...
Accelerated Gradient Descent by Factor-Centering Decomposition
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network’s gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simpli...
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13158653